
Add containers/tei/{cpu,gpu}/1.6.0 #132

Merged
alvarobartt merged 6 commits into main on Jan 3, 2025

Conversation

@alvarobartt (Member)

Description

This PR adds a new container for the just-released TEI v1.6.0 (see the release notes at https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.6.0).

The main feature of TEI v1.6.0 compared to v1.5.0 is that it now supports multiple CPU backends rather than just ONNX, so it can also serve embedding models on CPU when a model on the Hub doesn't come with an ONNX-converted version of the weights. Other features include the addition of the General Text Embeddings (GTE) heads, an implementation of MPNet, fixes around the health checks, and much more.

Note

This PR also includes the changes from the https://github.com/huggingface/text-embeddings-inference/releases/tag/v1.5.1 release.

To inspect the changes required to make the TEI container work in GCP, see the diff at:

@philschmid (Member) left a comment

LGTM!

@philschmid (Member)

How does this CPU multi-backend support work? Does it check whether there are *.onnx weights and, if so, use them; and if not, use normal PyTorch + Candle?

@alvarobartt (Member, Author)

> How does this CPU multi-backend support work? Does it check whether there are *.onnx weights and, if so, use them; and if not, use normal PyTorch + Candle?

Yes, it tries to download the ONNX weights first and otherwise falls back to using safetensors; attaching the code where the backend is initialized here as a reference 👍🏻

https://github.com/huggingface/text-embeddings-inference/blob/57d8fc8128ab94fcf06b4463ba0d83a4ca25f89b/backends/src/lib.rs#L199-L295
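For illustration, here's a minimal sketch of that selection logic; the `Backend` enum is a placeholder rather than TEI's actual type, and the downloads follow the `hf-hub` crate's sync API (TEI itself uses the async variant):

```rust
use hf_hub::api::sync::Api;

// Placeholder standing in for TEI's actual backend types.
enum Backend {
    Onnx(std::path::PathBuf),
    Candle(std::path::PathBuf),
}

fn init_cpu_backend(model_id: &str) -> anyhow::Result<Backend> {
    let repo = Api::new()?.model(model_id.to_string());

    // Try the ONNX weights first: `model.onnx` at the repo root, then the
    // `onnx/` subfolder where exported weights usually live.
    for filename in ["model.onnx", "onnx/model.onnx"] {
        if let Ok(path) = repo.get(filename) {
            return Ok(Backend::Onnx(path));
        }
    }

    // Otherwise fall back to the safetensors weights, served via Candle.
    let path = repo.get("model.safetensors")?;
    Ok(Backend::Candle(path))
}
```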

@alvarobartt (Member, Author)

alvarobartt commented Jan 3, 2025

Also @philschmid, see the logs below as a reference for how those look when running on CPU with a model from the Hub without ONNX-converted weights, e.g. ibm-granite/granite-embedding-125m-english.

[screenshot: TEI startup logs for ibm-granite/granite-embedding-125m-english on CPU]

One minor nit within the logs is that they claim to have downloaded the onnx/model.onnx file, per the following messages (also seen in the screenshot above):

2025-01-03T09:31:22.895547Z  INFO text_embeddings_backend: backends/src/lib.rs:401: Downloading `model.onnx`
2025-01-03T09:31:22.916581Z  WARN text_embeddings_backend: backends/src/lib.rs:405: Could not download `model.onnx`: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/ibm-granite/granite-embedding-125m-english/resolve/main/model.onnx)
2025-01-03T09:31:22.916595Z  INFO text_embeddings_backend: backends/src/lib.rs:406: Downloading `onnx/model.onnx`
2025-01-03T09:31:22.936811Z  INFO text_embeddings_backend: backends/src/lib.rs:218: Model ONNX weights downloaded in 41.26313ms

But that's not true and can be misleading, since it tries to initialize the ONNX backend even though the file is not there; cc @OlivierDehaene for reference (happy to open an issue for this or contribute a fix within the TEI repository if needed!)
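For what it's worth, one possible shape of the fix would be to propagate the download result so the success log can only fire when a file was actually fetched; a rough sketch (again using the `hf-hub` sync API, not TEI's actual code):

```rust
// Only emit the "downloaded" log when a download actually succeeded,
// and make the safetensors fallback explicit in the logs.
let onnx_path = repo
    .get("model.onnx")
    .or_else(|err| {
        tracing::warn!("Could not download `model.onnx`: {err}");
        repo.get("onnx/model.onnx")
    })
    .ok();

match &onnx_path {
    Some(path) => tracing::info!("Model ONNX weights downloaded to {}", path.display()),
    None => tracing::info!("No ONNX weights found, falling back to safetensors"),
}
```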

@alvarobartt alvarobartt merged commit 1c31c51 into main Jan 3, 2025
1 check passed
@alvarobartt alvarobartt deleted the upgrade-text-embeddings-inference branch January 3, 2025 09:37